We report on our on-going project to compute mesonic and baryonic two- and three-point correlation
functions in simulations using $N_f = 2$ flavours of O(a) improved Wilson quarks and the
Wilson plaquette action. We present performance figures for the DD-HMC algorithm on commodity
cluster hardware and discuss the issue of critical slowing down, which is particularly
pronounced for the topological charge. The effectiveness of stochastic noise sources and Jacobi
smearing is investigated. Our preliminary results obtained at three quark masses on $96 \times 48^3$ at
[...] runs is around 360 MeV, which corresponds to $m_\pi L = 5.3$.
1. Introduction

Despite the fact that there has been enormous recent activity in simulating lattice QCD with 
dynamical quarks, the continuum limit is still poorly understood. There are few systematic scaling 
studies of hadronic quantities, and many results for phenomenologically interesting observables 
have been obtained at one or two values of the lattice spacing only. As far as control over cutoff
effects is concerned, simulations with dynamical quarks have not yet reached the same maturity
as those in the quenched approximation. As the latter is being abandoned, one runs the risk
of replacing one systematic effect (quenching) by another. The need for having full control over 
all systematics is further highlighted by the fact that lattice results are increasingly important for 
providing constraints on the validity of the Standard Model. 

The work presented here is part of the CLS ("Coordinated Lattice Simulations") project [fi]], 
which is aimed at generating a set of ensembles for QCD with two dynamical flavours for a variety 
of lattice spacings ($a \approx 0.04, 0.06, 0.08$ fm) and volumes, such that the continuum limit can be taken
in a controlled manner. Non-perturbatively O(a) improved Wilson quarks are used to discretise the
quark action. In order to tune the masses of the light quarks towards their physical values whilst 
keeping the numerical effort in the simulations at a manageable level, we employ the deflation- 
accelerated DD-HMC algorithm [§]. 

2. Production runs 

The production runs which we carry out as part of the CLS effort are performed on the clus- 
ter platform "Wilson" at the University of Mainz, which is exclusively used for lattice QCD [Q]. 
It comprises 280 compute nodes, each equipped with two AMD 2356 "Barcelona" processors, 
clocked at 2.3 GHz. Each core has one GByte of memory so that the cluster's total memory 
amounts to 2.24 TBytes. Communication between nodes is realised via an InfiniBand network and
switch (DDR 20+20 Gb/s full duplex). The compute nodes are placed in water-cooled server racks. 
Benchmarks based on typical QCD applications [Q] show that the cluster's sustained performance 
scales up to 3.6 TFlops, depending on the local system size. Given the procurement costs
of 1.1 M€, this implies a cost-effectiveness of about 0.30 €/MFlops (sustained). The ratio of the
required cooling capacity to the compute speed amounts to 20 kW/TFlops.
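
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes four cores per processor (not stated explicitly in the text, but consistent with one GByte per core and 2.24 TBytes in total) and takes 1 TByte = 1000 GBytes; all variable names are illustrative.

```python
# Cross-check of the cluster figures quoted in the text.
NODES = 280
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 4        # assumption: quad-core processors
MEMORY_PER_CORE_GB = 1.0
SUSTAINED_TFLOPS = 3.6         # upper end of the quoted sustained performance
COST_EUR = 1.1e6
COOLING_KW_PER_TFLOPS = 20.0

total_cores = NODES * PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
total_memory_tb = total_cores * MEMORY_PER_CORE_GB / 1000.0
eur_per_mflops = COST_EUR / (SUSTAINED_TFLOPS * 1.0e6)   # 1 TFlops = 1e6 MFlops
cooling_kw = COOLING_KW_PER_TFLOPS * SUSTAINED_TFLOPS

print(f"cores: {total_cores}, memory: {total_memory_tb:.2f} TBytes")
print(f"cost-effectiveness: {eur_per_mflops:.3f} EUR/MFlops")
print(f"cooling at full sustained speed: {cooling_kw:.0f} kW")
```

Under these assumptions the core count and memory reproduce the quoted totals, and the cost per MFlops comes out close to the quoted value of about 0.30 €/MFlops.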

We have generated configurations at $\beta = 5.5$ on lattices of size $96 \times 48^3$. Following [g| we set
the coefficient of the Sheikholeslami-Wohlert term to $c_{\rm sw} = 1.75150$. On our cluster platform, we
ran at three values of the hopping parameter simultaneously, using 576 processor cores per job. The
length of one HMC trajectory was set to $\tau = 0.5$, and the block size was chosen as $8^2 \times 12^2$. Further
information on simulation parameters and performance figures is listed in Table 1. At each value
of $\kappa$ several thousand trajectories were generated for thermalisation.

The Monte Carlo history of the average plaquette at our largest quark mass is shown in Fig. 1.
For the first 3000 trajectories a small trend in the data is observed, which is attributed to insufficient
thermalisation. Similar observations were made at the other quark masses, and hence we
discarded the first 3000 trajectories in each run. Following the method in ref. [0], the integrated
autocorrelation time for the plaquette was determined, and the resulting values are listed in Table 1.
We stored configurations after every 16th trajectory on disk, which leaves us with more than 600 
configurations at each quark mass. 
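
An integrated autocorrelation time of the kind quoted for the plaquette can be estimated from a Monte Carlo time series by summing the normalised autocorrelation function up to a window $W$. The sketch below uses a fixed, hand-chosen window and the convention $\tau_{\rm int} = 1/2 + \sum_{t=1}^{W}\rho(t)$; it is not necessarily the procedure of the cited reference, which may choose the window automatically and use a different normalisation. The function names and the synthetic AR(1) test series are illustrative assumptions, not data from our runs.

```python
import random

def autocorrelation(series, t):
    """Normalised autocorrelation rho(t) of a time series at lag t."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + t] - mean)
              for i in range(n - t)) / (n - t)
    return cov / var

def tau_int(series, window):
    """Integrated autocorrelation time, 1/2 + sum_{t=1}^{window} rho(t)."""
    return 0.5 + sum(autocorrelation(series, t) for t in range(1, window + 1))

def ar1_series(n, rho, seed=1):
    """Synthetic AR(1) series whose exact rho(t) is rho**t (test data only)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

series = ar1_series(50000, 0.5)
estimate = tau_int(series, window=20)
exact = 0.5 + 0.5 / (1.0 - 0.5)   # 1/2 + rho/(1 - rho) for rho = 0.5
print(f"estimated tau_int = {estimate:.2f}, exact = {exact:.2f}")
```

In practice the window has to be large compared with the autocorrelation time but small compared with the length of the series, which is why automatic windowing procedures are usually preferred.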
Run   Lattice            Block size           $\kappa$   $n_0, n_1, n_2$   time/traj.   $P_{\rm acc}$   $N_{\rm tr}$   $\tau_{\rm int}$[plaq]
N3    $96 \times 48^3$   $8^2 \times 12^2$    0.13640    4, 5, 16          763 s        0.85            13761          16(3)
N4    $96 \times 48^3$   $8^2 \times 12^2$    0.13650    4, 5, 20          943 s        0.87            13104          14(2)
N5    $96 \times 48^3$   $8^2 \times 12^2$    0.13660    4, 5, 24          1262 s       0.86            12419          16(3)

Table 1: Run parameters at $\beta = 5.5$. We list the number of steps used in the hierarchical integration
schemes [0], the average CPU time per trajectory, the acceptance rate $P_{\rm acc}$ and the total number of trajectories,
$N_{\rm tr}$, generated in each run. The last column contains the integrated autocorrelation time of the average
plaquette, obtained after discarding the first 3000 trajectories in each run.

Figure 1: Left: Monte Carlo history of the average plaquette for run N3. Trajectories to the left of the
vertical dashed line were discarded; Right: Autocorrelation functions of the average plaquette obtained after
discarding the first 3000 trajectories.

Figure 2: Left: Monte Carlo history of the topological charge for run N3; Right: Distribution of the
topological charge after discarding the first 3000 trajectories.

The simulations performed as part of the CLS project revealed a severe case of critical slowing
down in the topological charge, which manifests itself in a steep rise of the associated autocorrelation
time as the lattice spacing is reduced. In particular, it was observed [7] that at $\beta = 5.7$
(which corresponds to a lattice spacing of $a \approx 0.04$ fm), tunnelling between topological sectors is
strongly suppressed. In Fig. 2 we plot the Monte Carlo history for run N3 of the topological charge,
$Q = a^4 \sum_x \mathrm{tr}\,[F_{\mu\nu}(x)\tilde{F}_{\mu\nu}(x)]/(16\pi^2)$. With the exception of the first 3000-4000 trajectories, the topological
charge fluctuates around zero at a sizeable rate and produces a distribution which is reasonably
symmetric. Similar observations were made at the other values of the quark mass used in our
simulations. Thus, unlike the situation encountered at the larger $\beta$-value of 5.7 [7], the topological
charge does not appear to be stuck in a particular sector. While this may be accidental, we can take
confidence that the composition of our ensembles is apparently not strongly biased. We stress that
critical slowing down is a general problem for lattice simulations near the continuum limit, which calls for a
radical treatment like the one proposed in [gj.



3. Mesonic and baryonic two-point functions 

The most widely used procedure to compute quark propagators is the source method, which
amounts to solving the linear system

    $D\,\Phi = \eta$,    (3.1)

where $D$ is the lattice Dirac operator and $\eta$ a source vector. If $\eta$ is chosen to be a point source, the
resulting hadron correlators can be quite noisy, with the exception of the simplest channels such as
the pion. An unambiguous identification of the asymptotic behaviour is then quite difficult. It is not
only desirable to reduce the level of statistical noise but also to enhance the spectral weight of the
desired state in the spectral decomposition of the correlator. In our simulations we have addressed
the first problem by comparing different stochastic noise sources. In particular, we have implemented
the generalised "one-end-trick" [10]. In order to enhance and tune the projection properties
of interpolating operators, we have implemented several variants of Jacobi smearing [11].
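Let $\eta$ be a random noise vector which satisfies

    $\langle\langle \eta^a_\alpha(x)\,\eta^b_\beta(y)^* \rangle\rangle = \delta^{(4)}(x-y)\,\delta^{ab}\,\delta_{\alpha\beta}$,    (3.2)

where double brackets denote the stochastic average. The two-point correlation function of a quark
bilinear, $O(x) = (\bar\psi\,\Gamma\,\psi)(x)$, is given by

    $\sum_{\vec x}\langle O(x)\,O(y)^\dagger\rangle = -\sum_{\vec x}\langle \mathrm{Tr}\{S(x,y)\,\Gamma\,\gamma_5\,S(x,y)^\dagger\,\gamma_5\,\bar\Gamma\}\rangle$,    (3.3)

where $S(x,y)$ is the quark propagator and $\bar\Gamma = \gamma_0\Gamma^\dagger\gamma_0$. The generalised one-end-trick amounts to choosing a spin-diagonal random
source vector. More specifically, the noise source has support only on a particular spin component
$\tau$ and timeslice (e.g. $y_0 = 0$), viz.

    $\eta^b_\alpha(y) = \xi^b(\vec y)\,\delta_{y_0,0}\,\delta_{\alpha\tau}$,    $\langle\langle \xi^b(\vec y)\,\xi^c(\vec z)^* \rangle\rangle = \delta^{(3)}(\vec y-\vec z)\,\delta^{bc}$.    (3.4)

Solving the linear system, eq. (3.1), for spin component $\tau$ yields the solution vector $\Phi$, i.e.

    $\Phi^a_{\alpha\tau}(x) = \sum_{\vec y}\sum_b S^{ab}_{\alpha\tau}(x,y)\big|_{y_0=0}\,\xi^b(\vec y)$.    (3.5)

The correlation function is then obtained as

    $\sum_{\vec x,\vec y}\langle O(x)\,O(y)^\dagger\rangle = -\sum_{\vec x}\Big\langle \Big\langle\!\Big\langle \sum_{a,\alpha,\tau} [\Phi(x)\,\Gamma\gamma_5]^a_{\alpha\tau}\,\big([(\gamma_5\bar\Gamma)^\dagger\,\Phi(x)]^a_{\alpha\tau}\big)^* \Big\rangle\!\Big\rangle \Big\rangle$,    (3.6)

where $\Gamma\gamma_5$ acts on the source spin index $\tau$ of $\Phi$ and $(\gamma_5\bar\Gamma)^\dagger$ on its sink spin index $\alpha$.
For every "hit", i.e. every choice of random source, one must perform four inversions, one for each
spin component $\tau$. Compared with the point source, which requires twelve inversions (one for each
spin and colour component), the numerical effort is reduced by a factor of three per hit.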
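
The mechanism behind eqs. (3.4)-(3.6) can be illustrated with a toy calculation. The sketch below replaces the propagator by a dense random complex matrix carrying only site and spin indices (no colour, no gauge field, no actual inversion) and treats the simplest case in which the spin matrices reduce to the identity up to a sign, as in the pion channel. All names and sizes are illustrative assumptions.

```python
import random

def z2xz2(rng):
    """One Z2 (x) Z2 noise entry (+-1 +- i)/sqrt(2); its modulus is exactly 1."""
    return complex(rng.choice((-1.0, 1.0)), rng.choice((-1.0, 1.0))) / 2 ** 0.5

def random_propagator(n_sites, n_spin, rng):
    """Toy stand-in for S(x,y): a dense random complex matrix (no physics)."""
    dim = n_sites * n_spin
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
            for _ in range(dim)]

def exact_sum(S):
    """sum_{x,y} tr[S(x,y) S(x,y)^dagger], i.e. the sum of |S_ij|^2."""
    return sum(abs(z) ** 2 for row in S for z in row)

def one_end_estimate(S, n_sites, n_spin, n_hits, rng):
    """Estimate exact_sum(S) from noise sources with support on one spin component."""
    dim = n_sites * n_spin
    total = 0.0
    for _ in range(n_hits):
        xi = [z2xz2(rng) for _ in range(n_sites)]      # one noise vector per hit
        for tau in range(n_spin):                      # one "solve" per spin component
            for row in range(dim):
                # Phi_tau(row) = sum_y S(row; y, tau) * xi(y)
                phi = sum(S[row][y * n_spin + tau] * xi[y] for y in range(n_sites))
                total += abs(phi) ** 2
    return total / n_hits

rng = random.Random(7)
S = random_propagator(n_sites=6, n_spin=4, rng=rng)
estimate = one_end_estimate(S, n_sites=6, n_spin=4, n_hits=2000, rng=rng)
print(f"exact = {exact_sum(S):.1f}, stochastic estimate = {estimate:.1f}")
```

The estimate sums over all source positions on the timeslice at the cost of one solve per spin component and hit, whereas a point source samples a single source position only.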
Figure 3: Left: Effective masses in the pion channel obtained using a point source and a spin-diagonal
$Z_2 \otimes Z_2$ noise source at fixed numerical cost; Right: Effective masses for the nucleon computed on lattice N4.



In our project we have chosen $Z_2 \otimes Z_2$ noise for the sources. In Fig. 3 we compare the
statistical signal for the conventional point source to the generalised one-end trick for three hits, 
such that the numerical cost for computing correlators for the two source types is identical. It is 
seen that in the pion channel random noise sources can lead to a significant enhancement of the 
statistical signal. Further studies showed that a similar improvement is, unfortunately, not observed 
in the vector channel. For baryons, we used the method of ref. [12], but without explicit low-mode 
averaging. Here, in order to reach a given statistical accuracy, the numerical effort was at least as 
large as for point sources, even after trying various dilution schemes [13], and therefore we found 
the method to be practically useless for the determination of baryonic ground state masses. 
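
Effective masses such as those shown in Fig. 3 are commonly obtained from the logarithm of the ratio of the correlator on neighbouring timeslices, $a m_{\rm eff}(t) = \ln[C(t)/C(t+1)]$. The text does not state which definition was used, so the sketch below assumes this simplest form, which ignores the backward-propagating contribution on a periodic lattice. The two-state correlator is synthetic and its parameters are illustrative.

```python
import math

def effective_mass(correlator):
    """Log effective mass a*m_eff(t) = ln[C(t)/C(t+1)] for t = 0 .. len-2."""
    return [math.log(correlator[t] / correlator[t + 1])
            for t in range(len(correlator) - 1)]

# Synthetic correlator: ground state (a*m = 0.2) plus one excited state (a*m = 0.6).
corr = [1.0 * math.exp(-0.2 * t) + 0.5 * math.exp(-0.6 * t) for t in range(30)]
m_eff = effective_mass(corr)
print(" ".join(f"{m:.3f}" for m in m_eff[:5]), "...", f"{m_eff[-1]:.3f}")
```

The effective mass approaches the ground-state value from above as the excited-state contribution dies out; a source with a better overlap onto the ground state reaches the plateau at earlier times.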

In order to enhance the projection onto the ground state in a given channel, particularly for 
baryons, we have implemented Jacobi smearing [11], supplemented by "fat" link variables. The
latter were obtained either via the APE [14] or via the HYP [15] procedure. While we found much
better plateaus when using smeared links of either type, HYP smearing appears to have a slight
advantage. In Fig. 3 we compare effective mass plots for the nucleon, computed using point and
HYP-Jacobi smeared sources. It is seen that not only is the contribution of excited states reduced
but also that the plateau extends to larger timeslices if HYP-Jacobi smearing is applied, although
there may be room for further improvement via better tuning of the smearing parameters. 
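
Jacobi smearing builds a spatially extended source by repeatedly applying a gauge-covariant hopping operator $H$ to the source, $\psi^{(n)} = \psi^{(0)} + \kappa_S H \psi^{(n-1)}$. The sketch below illustrates the iteration on a one-dimensional periodic lattice with unit links, i.e. without gauge fields, colour, or link smearing. The parameter names `kappa_s` and `n_iter` and their values are purely illustrative and are not taken from the simulations described here.

```python
def jacobi_smear(source, kappa_s, n_iter):
    """Apply psi_n = psi_0 + kappa_s * H psi_{n-1} on a 1D periodic lattice.

    H is the nearest-neighbour hopping operator with unit links,
    (H psi)(x) = psi(x+1) + psi(x-1).
    """
    size = len(source)
    psi = list(source)
    for _ in range(n_iter):
        hopped = [psi[(x + 1) % size] + psi[(x - 1) % size] for x in range(size)]
        psi = [source[x] + kappa_s * hopped[x] for x in range(size)]
    return psi

SIZE, CENTRE = 32, 16
point = [0.0] * SIZE
point[CENTRE] = 1.0
smeared = jacobi_smear(point, kappa_s=0.2, n_iter=5)
print(" ".join(f"{v:.3f}" for v in smeared[CENTRE - 6:CENTRE + 7]))
```

The number of iterations controls the spatial extent of the source, while `kappa_s` controls its shape; both are the kind of smearing parameters that would be tuned to improve the plateau.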



4. Setting the scale 

In order to convert the pion masses computed on our ensembles into physical units, we must 
set the scale. The mass of the $\Omega$ baryon is very well suited for this purpose, since the $\Omega$ is stable in
QCD and because it contains only strange quarks in the valence sector. A long chiral extrapolation
in the valence quark mass can thus be avoided. For a reliable determination of the mass of the $\Omega$,
however, our simulations and analyses are not yet advanced enough. In order to obtain preliminary
values for the lattice scale, we have therefore resorted to using the mass of the $K^*$-meson.

To this end we have followed the procedure outlined in [16]: we have determined the masses of
pseudoscalar and vector mesons for degenerate and non-degenerate combinations of quarks, where 

Figure 4: The ratio $(m_K/m_{K^*})^2$ as a function of $(a m_K)^2$ for the three data sets. The horizontal dashed line
denotes the physical ratio $m_K/m_{K^*} = 0.554$.

Run   $m_\pi L$    $m_\pi$ [MeV]
N3    7.680(15)    524(13)
N4    6.540(18)    446(11)
N5    5.306(23)    362(9)

Table 2: Preliminary results for pion masses in physical units.

one of the masses was fixed to coincide with the sea quark mass. We denote the masses of the
generic non-degenerate pseudoscalar and vector mesons by $m_K$ and $m_{K^*}$, respectively. Their values
were obtained from single-exponential fits to the corresponding correlation functions, where the
latter were computed using stochastic sources. The first step in the scale-setting procedure consists
of interpolating the ratio $(m_K/m_{K^*})^2$ as a function of $(a m_K)^2$ to the experimentally observed value
of $m_K/m_{K^*} = 0.554$. Fig. 4 shows the data points for the three ensembles. The intersection of the
fit curves with the horizontal dashed line determines the kaon mass in lattice units, $a m_K$, and thus
fixes the bare mass of the strange quark at a given value of the sea quark mass. In the second
step one interpolates $a m_K$ in the sea quark mass to the reference value $m_\pi/m_K = 0.85$. Obviously,
this value does not correspond to the physical situation. However, as explained in [16], it serves
as a perfectly well-defined reference point, which is sufficient for comparing data on different
ensembles. The kaon mass at the reference point is determined as $a m_K|_{\rm ref} = 0.1512(38)$. After
inserting the physical (isospin-averaged) kaon mass of 495 MeV, one obtains $a = 0.0603(15)$ fm.
This value can then be used to convert the pion masses on the various ensembles into physical units,
which yields the values listed in Table 2, where the combination $m_\pi L$ is shown as well.
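
The conversion into physical units described above amounts to simple arithmetic, which the following sketch reproduces. It assumes $\hbar c = 197.327$ MeV fm, a spatial extent of $L/a = 48$, and propagates only the uncertainty of $a m_K|_{\rm ref}$ into the lattice spacing; the variable and function names are illustrative.

```python
HBARC_MEV_FM = 197.327          # hbar*c in MeV fm
M_KAON_MEV = 495.0              # isospin-averaged kaon mass used in the text
AM_K_REF, AM_K_REF_ERR = 0.1512, 0.0038
SPATIAL_EXTENT = 48             # L/a for the 96 x 48^3 lattices
MPI_L = {"N3": 7.680, "N4": 6.540, "N5": 5.306}

# a = (a m_K) * hbar c / m_K; the relative error is that of a m_K
a_fm = AM_K_REF * HBARC_MEV_FM / M_KAON_MEV
a_err_fm = a_fm * AM_K_REF_ERR / AM_K_REF

def pion_mass_mev(mpi_l, a=a_fm, extent=SPATIAL_EXTENT):
    """Convert m_pi*L into MeV: m_pi = (m_pi L) * hbar c / (extent * a)."""
    return mpi_l * HBARC_MEV_FM / (extent * a)

print(f"a = {a_fm:.4f} +/- {a_err_fm:.4f} fm")
for run, mpi_l in MPI_L.items():
    print(f"{run}: m_pi = {pion_mass_mev(mpi_l):.0f} MeV")
```

With these inputs the lattice spacing and the central values of the pion masses agree with the numbers quoted in the text; the roughly 2.5% uncertainty of the pion masses in physical units is dominated by the uncertainty of the scale.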

5. Conclusions 

Our studies have shown that large lattices at fine resolution can be simulated efficiently on 
commodity clusters. In spite of a sharp increase in the autocorrelation time of the topological 
charge observed at even smaller lattice spacings [7], the distributions for this quantity obtained in
our runs are not pathological. We plan to compute two- and three-point correlation functions for
mesonic and baryonic states in order to determine a variety of observables. So far our minimum
pion mass is about 360 MeV, and $m_\pi L$ is kept larger than 5. Lowering the quark mass further, in
order to access pion masses of less than 300 MeV, would necessitate going to larger lattice sizes, if
one wants to maintain the condition $m_\pi L > 3$.

With the currently available algorithms, i.e. while a satisfactory solution to the problem of 
critical slowing down is still under investigation, it is not worth investing more effort into the 
generation of ensembles with smaller lattice spacings. 